15 research outputs found

    Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species

    Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of this system. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can potentially take into account all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameter combinations should be expected and that the number of convolutional filters and regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks for the full additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of taking into account these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL. 
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/

    There and back again: historical perspective and future directions for Vaccinium breeding and research studies

    The genus Vaccinium L. (Ericaceae) contains a wide diversity of culturally and economically important berry crop species. Consumer demand and scientific research in blueberry (Vaccinium spp.) and cranberry (Vaccinium macrocarpon) have increased worldwide over the crops' relatively short domestication history (~100 years). Other species, including bilberry (Vaccinium myrtillus), lingonberry (Vaccinium vitis-idaea), and ohelo berry (Vaccinium reticulatum) are largely still harvested from the wild but with crop improvement efforts underway. Here, we present a review article on these Vaccinium berry crops on topics that span taxonomy to genetics and genomics to breeding. We highlight the accomplishments made thus far for each of these crops, along their journey from the wild, and propose research areas and questions that will require investments by the community over the coming decades to guide future crop improvement efforts. New tools and resources are needed to underpin the development of superior cultivars that are not only more resilient to various environmental stresses and higher yielding, but also produce fruit that continues to meet a variety of consumer preferences, including fruit quality and health-related traits.

    COVID-19 symptoms at hospital admission vary with age and sex: results from the ISARIC prospective multinational observational study

    Background: The ISARIC prospective multinational observational study is the largest cohort of hospitalized patients with COVID-19. We present relationships of age, sex, and nationality to presenting symptoms. Methods: International, prospective observational study of 60 109 hospitalized symptomatic patients with laboratory-confirmed COVID-19 recruited from 43 countries between 30 January and 3 August 2020. Logistic regression was performed to evaluate relationships of age and sex to published COVID-19 case definitions and the most commonly reported symptoms. Results: ‘Typical’ symptoms of fever (69%), cough (68%) and shortness of breath (66%) were the most commonly reported. 92% of patients experienced at least one of these. Prevalence of typical symptoms was greatest in 30- to 60-year-olds (respectively 80, 79, 69%; at least one 95%). They were reported less frequently in children (≀ 18 years: 69, 48, 23; 85%), older adults (≄ 70 years: 61, 62, 65; 90%), and women (66, 66, 64; 90%; vs. men 71, 70, 67; 93%, each P < 0.001). The most common atypical presentations under 60 years of age were nausea and vomiting and abdominal pain, and over 60 years was confusion. Regression models showed significant differences in symptoms with sex, age and country. Interpretation: This international collaboration has allowed us to report reliable symptom data from the largest cohort of patients admitted to hospital with COVID-19. Adults over 60 and children admitted to hospital with COVID-19 are less likely to present with typical symptoms. Nausea and vomiting are common atypical presentations under 30 years. Confusion is a frequent atypical presentation of COVID-19 in adults over 60 years. Women are less likely to experience typical symptoms than men.

    Data from: Accurate genomic prediction of Coffea canephora in multiple environments using whole-genome statistical models

    Genomic selection has been proposed as the standard method to predict breeding values in animal and plant breeding. Although some crops have benefited from this methodology, studies in Coffea are still emerging. To date, there have been no studies of how well genomic prediction models work across populations and environments for different complex traits in coffee. Considering that predictive models are based on biological and statistical assumptions, their performance is expected to vary depending on how well these assumptions align with the true genetic architecture of the phenotype. To investigate this, we used data from two recurrent selection populations of Coffea canephora, evaluated in two locations, and single nucleotide polymorphisms identified by Genotyping-by-Sequencing. In particular, we evaluated the performance of 13 statistical approaches to predict three important traits in coffee: production of coffee beans, leaf rust incidence and yield of green beans. Analyses were performed for predictions within-environment, across locations and across populations to assess the reliability of genomic selection. Overall, differences in the prediction accuracy of the competing models were small, although the Bayesian methods showed a modest improvement over other methods, at the cost of more computation time. As expected, predictive accuracy for within-environment analysis was, on average, higher than for predictions across locations and across populations. Our results support the potential of genomic selection to reshape traditional plant breeding schemes. In practice, we expect to increase the genetic gain per unit of time by reducing the cycle length of recurrent selection in coffee.

    Genetic divergence among pepper genotypes based on morpho-agronomic traits

    The genus Capsicum comprises a diverse group of hot and sweet peppers with a large number of species. Characterizing the genetic divergence of available materials is of paramount importance for breeding programs. Multivariate techniques were used to evaluate the genetic divergence among 34 sub-samples of Capsicum baccatum peppers from the Horticultural Germplasm Bank of the Federal University of Viçosa. Five quantitative descriptors proposed by the International Plant Genetic Resources Institute were used in a field experiment carried out in Viçosa, Minas Gerais State, Brazil, in a randomized block design. The genetic divergence among the sub-samples was determined by cluster analysis and canonical variables. The variables fruit average weight, fruit length, fruit diameter, number of seeds per fruit and content of soluble solids were evaluated. There were significant differences among sub-samples for all descriptors evaluated. General agreement among the multivariate techniques was observed, and it was possible to separate the sub-samples into five distinct groups. The sub-samples BGH 1739 and BGH 1646 stood out, showing good potential for use in breeding programs aiming to produce good materials for fresh consumption or processing.

    SNPs and phenotypes in Coffea canephora population

    This dataset contains the genotypic and phenotypic information used in the whole-genome statistical models for Coffea canephora. In the original manuscript, two recurrent selection populations were genotyped and three traits (coffee bean production, incidence of rust and yield of green beans) were evaluated in two locations. Genotypic data (SNPs identified using the Genotyping-by-Sequencing approach) for both populations are stored in two .csv files (-1,0,1 format). Phenotype data for both populations are also stored in two .csv files. The values stored in the phenotypic files are phenotypes adjusted for linear effects of environmental covariates and other experimental covariates.

    Data from: Insights into the genetic basis of blueberry fruit-related traits using diploid and polyploid models in a GWAS context

    Polyploidization is an ancient and recurrent process in plant evolution, impacting the diversification of natural populations and plant breeding strategies. Polyploidization occurs in many important crops; however, its effects on inheritance of many agronomic traits are still poorly understood compared with diploid species. Higher levels of allelic dosage or more complex interactions between alleles could affect the phenotype expression. Hence, the present study aimed to dissect the genetic basis of fruit-related traits in autotetraploid blueberries and identify candidate genes affecting phenotypic variation. We performed a genome-wide association study (GWAS) assuming diploid and tetraploid inheritance, encompassing distinct models of gene action (additive, general, different orders of allelic interaction and the corresponding diploidized models). A total of 1,575 southern highbush blueberry individuals from a breeding population of 117 full-sib families were genotyped using sequence capture and next-generation sequencing, and evaluated for eight fruit-related traits. For the diploid allele calling, 77,496 SNPs were detected; while 80,591 SNPs were obtained in tetraploid, with a high degree of overlap (95%) between them. A linear mixed model that accounted for population and family structure was used for the GWAS analyses. By modeling tetraploid genotypes, we detected 15 SNPs significantly associated with five fruit-related traits. Alternatively, seven significant SNPs were detected for only two traits using diploid genotypes, with two SNPs overlapping with the tetraploid scenario. Our results showed that the importance of tetraploid models varied by trait and that the use of diploid models has hindered the detection of SNP-trait associations and, consequently, the genetic architecture of some commercially important traits in autotetraploid species. Furthermore, 14 SNPs co-localized with candidate genes, five of which lead to non-synonymous amino acid changes. 
The potential functional significance of these SNPs is discussed

    Genomic Prediction of Autotetraploids; Influence of Relationship Matrices, Allele Dosage, and Continuous Genotyping Calls in Phenotype Prediction

    Estimation of allele dosage, using genomic data, in autopolyploids is challenging and current methods often result in the misclassification of genotypes. Some progress has been made when using SNP arrays, but the major challenge is when using next generation sequencing data. Here we compare the use of read depth as continuous parameterization with ploidy parameterizations in the context of genomic selection (GS). Additionally, different sources of information to build relationship matrices were compared. A real breeding population of the autotetraploid species blueberry (Vaccinium corymbosum), composed of 1,847 individuals, was phenotyped for eight yield and fruit quality traits over two years. Continuous genotypic based models performed as well as the best models. This approach also reduces the computational time and avoids problems associated with misclassification of genotypic classes when assigning dosage in polyploid species. This approach could be very valuable for species with higher ploidy levels or for emerging crops where ploidy is not well understood. To our knowledge, this work constitutes the first study of genomic selection in blueberry. Accuracies are encouraging for application of GS for blueberry breeding. GS could reduce the time for cultivar release by three years, increasing the genetic gain per cycle by 86% on average when compared to phenotypic selection, and 32% when compared with pedigree-based selection. Finally, the genotypic and phenotypic data used in this study are made available for comparative analysis of dosage calling and genomic selection prediction models in the context of autopolyploids.